Random Substitution-Insertion-Deletion (RSID) Model of Molecular Evolution with Alignment-free Parameter Estimation

Author

  • David Koslicki
Abstract

We present a comprehensive new framework for handling biologically accurate models of molecular evolution. The framework supports the systematic study of models that implement heterogeneous rates, conservation of reading frame, differing rates of insertion and deletion, customizable parametrization of the probabilities and types of substitutions, insertions, and deletions, and neighboring dependencies. We state the model as an infinite-state Markov chain in order to maximize the number of theorems applicable to its analysis, and we use such theorems to develop an alignment-free parameter estimation technique. This technique circumvents many of the nuanced issues related to alignment-dependent estimation. We then apply an implementation of our model to reproduce, in a completely alignment-free fashion, some observed results of Zhang and Gerstein [29] regarding indel length distribution in human ribosomal protein pseudogenes.
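
To make the idea of a Markov chain over sequences concrete, the following is a minimal toy sketch and not the model defined in the paper. It assumes a four-letter DNA alphabet and independent, uniform per-site probabilities `p_sub`, `p_ins`, and `p_del`; the function names and parameters are hypothetical, and the sketch omits heterogeneous rates, reading-frame conservation, and neighboring dependencies. Because insertions and deletions change the sequence length, the state space is the set of all finite strings, which is why such a chain has infinitely many states.

```python
import random

ALPHABET = "ACGT"  # assumed alphabet for illustration only


def rsid_step(seq, p_sub, p_ins, p_del, rng):
    """Apply one round of per-site deletion, substitution and insertion.

    Toy assumption: each site is deleted with probability p_del,
    otherwise substituted with probability p_sub; after every
    surviving site one random base is inserted with probability p_ins.
    """
    out = []
    for base in seq:
        r = rng.random()
        if r < p_del:
            continue  # site is deleted
        if r < p_del + p_sub:
            # substitute with a different base, chosen uniformly
            base = rng.choice([b for b in ALPHABET if b != base])
        out.append(base)
        if rng.random() < p_ins:
            out.append(rng.choice(ALPHABET))  # insert after this site
    return "".join(out)


def evolve(seq, steps, p_sub=0.01, p_ins=0.005, p_del=0.005, seed=0):
    """Run the chain for a number of steps from an ancestral sequence."""
    rng = random.Random(seed)
    for _ in range(steps):
        seq = rsid_step(seq, p_sub, p_ins, p_del, rng)
    return seq


if __name__ == "__main__":
    ancestor = "ACGT" * 10
    print(evolve(ancestor, steps=50))
```

Each call to `rsid_step` depends only on the current sequence, so repeated calls form a Markov chain; estimating `p_sub`, `p_ins`, and `p_del` from the output without aligning it to the ancestor is the kind of problem the abstract refers to as alignment-free estimation.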

Similar resources

Twisted trees and inconsistency of tree estimation when gaps are treated as missing data - The impact of model mis-specification in distance corrections.

Statistically consistent estimation of phylogenetic trees or gene trees is possible if pairwise sequence dissimilarities can be converted to a set of distances that are proportional to the true evolutionary distances. Susko et al. (2004) reported some strikingly broad results about the forms of inconsistency in tree estimation that can arise if corrected distances are not proportional to the tr...

Estimation and reliability of molecular sequence alignments.

The problem of estimating the relatedness of a pair of biological sequences is addressed. A stochastic model of sequence evolution is described that allows insertion and deletion as well as replacement of amino acid residues (or substitution of nucleotides) over time. An expectation-maximization (EM) algorithm that obtains maximum likelihood estimates of the model parameters is introduced. The ...

Statistical alignment based on fragment insertion and deletion models

Motivation: The topic of this paper is the estimation of alignments and mutation rates based on stochastic sequence-evolution models that allow insertions and deletions of subsequences ('fragments') and not just single bases. The model we propose is a variant of a model introduced by Thorne et al. (J. Mol. Evol., 34, 3-16, 1992). The computational tractability of the model depends on certain re...

Generating benchmarks for multiple sequence alignments and phylogenetic reconstructions.

We present a new probabilistic model of evolution of RNA-, DNA-, or protein-like sequences and a tool, rose, that implements this model. By insertion, deletion and substitution of characters, a family of sequences is created from a common ancestor. During this artificial evolutionary process, the "true" history is logged and the "correct" multiple sequence alignment is created simultaneously. We ...

Standard maximum likelihood analyses of alignments with gaps can be statistically inconsistent

Background: Most statistical methods for phylogenetic estimation in use today treat a gap (generally representing an insertion or deletion, i.e., indel) within the input sequence alignment as missing data. However, the statistical properties of this treatment of indels have not been fully investigated. Results: We prove that maximum likelihood phylogeny estimation, treating indels as missing data, c...

Publication date: 2011